BDMS-626: Improve validation, schema alignment, and well inventory handling#596
ksmuczynski wants to merge 38 commits into `staging` from `kas-BDMS-626-resolve-database-errors`
Conversation
- Introduced `validation_alias` with `AliasChoices` for selected fields (`well_status`, `sampler`, `measurement_date_time`, `mp_height`) to allow alternate field names. - Ensured alignment with schema validation updates.
- Introduced unit tests for `WellInventoryRow` alias mappings. - Verified correct handling of alias fields like `well_hole_status`, `mp_height_ft`, and others. - Ensured canonical fields take precedence when both alias and canonical values are provided.
… and new fields - Added `flexible_lexicon_validator` to support case-insensitive validation of enum-like fields. - Introduced new fields: `OriginType`, `WellPumpType`, `MonitoringStatus`, among others. - Updated existing fields to use flexible lexicon validation for improved consistency. - Adjusted `WellInventoryRow` optional fields handling and validation rules. - Refined contact field validation logic to require `role` and `type` when other contact details are provided.
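The case-insensitive matching could look roughly like the standalone sketch below; `flexible_lexicon_validator` here is a simplified stand-in for the project's validator, which is attached to pydantic fields:

```python
def flexible_lexicon_validator(valid_terms: set[str]):
    """Build a case-insensitive, whitespace-tolerant lexicon check.

    Simplified sketch: normalizes input, then returns the canonical
    spelling of the matched term so downstream code sees one form.
    """
    canonical = {term.strip().lower(): term for term in valid_terms}

    def validate(value):
        if value is None or str(value).strip() == "":
            return None  # blank values are treated as missing
        key = str(value).strip().lower()
        if key not in canonical:
            raise ValueError(f"{value!r} is not a recognized lexicon term")
        return canonical[key]

    return validate


check = flexible_lexicon_validator(
    {"Steel-tape measurement", "Electric sounder"}
)
```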
…dations - Refined validation error handling to provide more detailed feedback in test assertions. - Adjusted test setup to ensure accurate validation scenarios for contact and water level fields. - Updated contact-related tests to validate new composite field error messages.
- Renamed "Water" to "Water Bearing Zone" and refined its definition. - Added new term "Water Quality" under `note_type` category.
… to prevent cross-test collisions - Supports BDD test suite stability - Added hashing mechanism to append unique suffix to `well_name_point_id` for scenario isolation. - Integrated pandas for robust CSV parsing and content modifications when applicable. - Ensured handling preserves existing format for IDs ending with `-xxxx`. - Maintained existing handling for empty or non-CSV files.
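The suffix-hashing idea might be sketched as follows; the function name, hash choice, and the exact `-xxxx` placeholder handling are assumptions based on the commit message:

```python
import hashlib


def isolate_well_id(well_id: str, scenario_name: str) -> str:
    """Append a short, deterministic per-scenario suffix to a well ID.

    Sketch of the isolation idea: empty IDs and autogen "-xxxx"
    placeholders pass through unchanged (assumed convention); other
    IDs get a 4-character hash of the scenario name so parallel BDD
    scenarios cannot collide on primary keys.
    """
    if not well_id or well_id.lower().endswith("-xxxx"):
        return well_id
    suffix = hashlib.sha1(scenario_name.encode("utf-8")).hexdigest()[:4]
    return f"{well_id}-{suffix}"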
…ollback side effects - Supports transaction management - Moved `session.refresh` calls under `commit` condition to streamline database session operations. - Reorganized `session.rollback` logic to properly align with commit flow.
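The commit/refresh reorganization can be illustrated with a toy session object (a stand-in for a SQLAlchemy session, not the real helper):

```python
class FakeSession:
    """Records calls so we can see what a helper does to the session."""

    def __init__(self):
        self.log = []

    def commit(self):
        self.log.append("commit")

    def rollback(self):
        self.log.append("rollback")

    def refresh(self, obj):
        self.log.append("refresh")


def persist(session, obj, commit=True):
    """Sketch: refresh only after a real commit, rollback only if we
    own the transaction. With commit=False the caller's outer
    transaction stays untouched."""
    try:
        if commit:
            session.commit()
            session.refresh(obj)  # refresh is meaningless pre-commit
    except Exception:
        if commit:
            session.rollback()
        raise
    return obj
```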
…ory source fields in support of schema alignment and database mapping - Update well inventory CSV files to correct data inconsistencies and improve schema alignment. - Added support for `Sample`, `Observation`, and `Parameter` objects within well inventory processing. - Enhanced elevation handling with optional and default value logic. - Introduced `release_status`, `monitoring_status`, and validation for derived fields. - Updated notes handling with new cases and refined content categorization. - Improved `depth_to_water` processing with associated sample and observation creation. - Refined lexicon updates and schema field adjustments for better data consistency.
…h 1 well - Updated BDD tests to reflect changes in well inventory bulk upload logic, allowing the import of 1 well despite validation errors. - Modified step definitions for more granular validation on imported well counts. - Enhanced error message detail in responses for validation scenarios. - Adjusted sample CSV files to match new import logic and validation schema updates. - Refined service behavior to improve handling of validation errors and partial imports.
Pull request overview
This PR updates the well-inventory CSV ingestion pipeline to better align schema validation with lexicon-backed enums, persist previously-missed fields (notes, monitoring/public availability, water level observations), and change the import behavior to “best-effort” so individual row failures don’t abort the entire upload.
Changes:
- Added row-level savepoints and improved error reporting so invalid rows are skipped while valid rows are persisted.
- Updated the `WellInventoryRow` schema to relax/adjust requirements and introduce lexicon-backed enum parsing plus CSV field aliases.
- Updated BDD/unit tests and test CSV fixtures to reflect new validation rules and partial-success behavior.
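The row-level savepoint pattern can be sketched with stdlib `sqlite3`; the real code uses SQLAlchemy's nested transactions, so this is the idea, not the implementation:

```python
import sqlite3


def best_effort_import(conn, rows):
    """Wrap each row in its own SAVEPOINT so one bad row is rolled
    back alone and the remaining rows still persist."""
    imported, errors = [], []
    for i, row in enumerate(rows):
        conn.execute(f"SAVEPOINT row_{i}")
        try:
            conn.execute(
                "INSERT INTO wells (name) VALUES (?)", (row["name"],)
            )
            conn.execute(f"RELEASE SAVEPOINT row_{i}")
            imported.append(row["name"])
        except Exception as exc:
            conn.execute(f"ROLLBACK TO SAVEPOINT row_{i}")
            conn.execute(f"RELEASE SAVEPOINT row_{i}")
            errors.append({"row": i, "error": str(exc)})
    conn.commit()  # the outer transaction commits the surviving rows
    return imported, errors


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wells (name TEXT PRIMARY KEY)")
# The duplicate "W-1" violates the primary key but only its row fails.
imported, errors = best_effort_import(
    conn, [{"name": "W-1"}, {"name": "W-1"}, {"name": "W-2"}]
)
```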
Reviewed changes
Copilot reviewed 34 out of 34 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| `services/well_inventory_csv.py` | Best-effort import via nested savepoints; mapping updates for release status, notes, monitoring status, and water level observation persistence |
| `schemas/well_inventory.py` | Schema optionality updates, lexicon enum coercion, contact + water level validation adjustments, and alias handling |
| `services/thing_helper.py` | Adds monitoring status history writes; adjusts commit/rollback behavior for outer-transaction support |
| `services/contact_helper.py` | Adjusts commit/rollback behavior for outer-transaction support |
| `cli/service_adapter.py` | Improves exit code and stderr reporting for partial success and validation failures |
| `tests/test_well_inventory.py` | Adds schema alias tests and helper row builder |
| `tests/features/well-inventory-csv.feature` | Updates expected partial-success outcomes in negative scenarios |
| `tests/features/steps/*.py` | Updates step definitions for partial success and loosens validation-error matching |
| `tests/features/data/*.csv` | Refreshes fixture data to match new lexicon/validation expectations |
| `core/lexicon.json` | Adds a new `note_type` term |
… unique well name suffixes in well inventory scenarios - Updated `pd.read_csv` calls with `keep_default_na=False` to retain empty values as-is. - Refined logic for suffix addition by excluding empty and `-xxxx` suffixed IDs. - Improved test isolation by maintaining scenario-specific unique identifiers.
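For example, with `keep_default_na=False` empty cells survive as empty strings rather than becoming `NaN`, so "missing" values round-trip through the fixture rewrite untouched:

```python
import io

import pandas as pd

csv_text = "well_name_point_id,depth_to_water_ft\nTEST-0001,\n,12.5\n"

# Default parsing would turn empty cells into NaN; keep_default_na=False
# preserves them as empty strings, matching what the importer expects.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
```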
…nd `DataQuality` - Changed `SampleMethodField` to validate against `SampleMethod` instead of `OriginType` - Changed `DataQualityField` to validate against `DataQuality` instead of `OriginType`
… import - Make contact.role and contact.contact_type nullable in the ORM and migrations - Update contact schemas and well inventory validation to accept missing values - Allow contact import when name or organization is present without role/type
- Stop round-tripping CSV fixtures through pandas to avoid rewriting structural test cases - Preserve repeated header rows and duplicate column fixtures so importer validation is exercised correctly - Keep the blank contact name/organization scenario focused on a single invalid row for stable assertions
…n errors - Prevent one actual validation error from satisfying multiple expected assertions (avoids false positives) - Keep validation matching order-independent while requiring distinct matches (preserves flexibility) - Tighten BDD error checks without relying on exact error text (improves test precision)
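The distinct-match idea can be sketched as a small helper (name and signature are assumptions): each expected substring must consume a different actual error, which keeps matching order-independent while blocking false positives.

```python
def match_expected_errors(expected_substrings, actual_errors):
    """Return True only if every expected substring matches a
    *different* actual error message, in any order."""
    remaining = list(actual_errors)
    for expected in expected_substrings:
        for i, actual in enumerate(remaining):
            if expected in actual:
                del remaining[i]  # consumed; cannot satisfy another expectation
                break
        else:
            return False  # no unconsumed error matched this expectation
    return True
```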
…behavior - Update partial-success scenarios to expect valid rows to import alongside row-level validation errors - Reflect current importer behavior for invalid lexicon, invalid date, and repeated-header cases - Keep BDD coverage focused on user-visible import outcomes instead of outdated all-or-nothing assumptions
…sitive parsing - Update unit expectations to accept lowercase placeholder tokens that are now supported - Document normalization of mixed-case and spaced placeholder formats to uppercase prefixes - Keep test coverage aligned with importer behavior and reduce confusion around valid autogen inputs
…DataQuality` - Adjust test data to reflect updated descriptions for `sample_method` and `data_quality` fields.
…ization scenarios - Add test to ensure contact creation returns None when both name and organization are missing - Add test to verify contact creation with organization only, ensuring proper dict structure - Update assertions for comprehensive validation of contact fields
@jirhiker Recent updates to the CLI command tests are failing on my branch. Since those changes don't seem to impact well inventory ingestion, should I resolve the test failures within this PR, or would you prefer I handle them in a separate one before re-opening for review?
…s for Sample and Observation - Replace `name_point_id` with `name` in `sample_name` generation - Rename `observation_*` fields for consistency with updated schemas
- Adjust `content.decode` to use `utf-8-sig` for correct header parsing of UTF-8 files with BOM - Prevent encoding issues when processing imported files
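A quick demonstration of why `utf-8-sig` matters here: with plain `utf-8`, the BOM stays glued to the first header name, so column matching silently fails.

```python
# A CSV payload exported with a UTF-8 byte-order mark (BOM).
content = b"\xef\xbb\xbfwell_name_point_id,project\nTEST-0001,Demo\n"

# Plain utf-8 keeps the BOM attached to the first header field.
bad_header = content.decode("utf-8").splitlines()[0].split(",")[0]

# utf-8-sig strips the BOM, so the header parses cleanly.
good_header = content.decode("utf-8-sig").splitlines()[0].split(",")[0]
```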
…h-to-water is blank - Treat blank depth_to_water_ft values as missing instead of invalid numeric input - Create water-level sample and observation records when water_level_date_time is present even if no depth value was obtained - Preserve attempted measurements for dry, obstructed, or otherwise unreadable wells without dropping the observation record
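A sketch of the blank-depth handling described above (helper names are assumptions; the real logic lives in the schema and service layer):

```python
def parse_depth_to_water(raw):
    """Blank values become None (missing) instead of raising a numeric
    validation error; non-blank values still must parse as floats."""
    if raw is None or str(raw).strip() == "":
        return None
    return float(raw)


def should_create_observation(water_level_date_time, depth_to_water_ft):
    """An attempted measurement (timestamp present, no depth obtained,
    e.g. a dry or obstructed well) still gets a sample/observation
    record rather than being dropped."""
    return water_level_date_time is not None
```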
jacob-a-brown
left a comment
Overall I think it's well done and well documented! I like your comments and how the code is organized
If monitoring_status is added in CreateWell then I think that it should also be a field in WellResponse
```python
is_suitable_for_datalogger: bool | None = None
is_open: bool | None = None
well_status: str | None = None
monitoring_status: str | None = None
```
This should be restricted to MonitoringStatus enum values
```python
        f" Zone={self.utm_zone}"
    )

if self.depth_to_water_ft is not None:
```
There should be a check here for mp_height, too, since it should be required for every observation.
```python
    else (model.sample_method or "Unknown")
)
sample = Sample(
    field_activity_id=fa.id,
```
this should evaluate to the newly created "groundwater level" FieldActivity record, not the "well inventory" FieldActivity
```python
@then("{count:d} wells are imported")
@then("{count:d} well is imported")
def step_then_count_wells_are_imported(context: Context, count: int):
    response_json = context.response.json()
    wells = response_json.get("wells", [])
    validation_errors = response_json.get("validation_errors", [])
    assert (
        len(wells) == count
    ), f"Expected {count} wells to be imported, but got {len(wells)}: {wells}. Errors: {validation_errors}"
```
If partial steps are not allowed then I don't think that this step will be necessary anymore
```python
def _minimal_valid_well_inventory_row():
    return {
        "project": "Test Project",
        "well_name_point_id": "TEST-0001",
        "site_name": "Test Site",
        "date_time": "2025-02-15T10:30:00",
        "field_staff": "Test Staff",
        "utm_easting": 357000,
        "utm_northing": 3784000,
        "utm_zone": "13N",
        "elevation_ft": 5000,
        "elevation_method": "Global positioning system (GPS)",
        "measuring_point_height_ft": 3.5,
    }
```
Since this is reusable I think that it should be a pytest fixture
```python
{
    "water_level_date_time": "2025-02-15T10:30:00",
    "depth_to_water_ft": "",
    "sample_method": "Steel-tape measurement",
    "data_quality": "Water level accurate to within two hundreths of a foot",
    "water_level_notes": "Attempted measurement",
    "mp_height_ft": 2.5,
}
```
This is a style comment, but if you put this dictionary in a variable you can access the values in subsequent assert statements. Then if something changes in the future you only need to update the key-value pair and the asserts should still hold.
```python
{
    "measuring_person": "Tech 1",
    "sample_method": "Steel-tape measurement",
    "water_level_date_time": "2025-02-15T10:30:00",
    "mp_height_ft": 2.5,
    "level_status": "Static",
    "depth_to_water_ft": 11.2,
    "data_quality": "Water level accurate to within two hundreths of a foot",
    "water_level_notes": "Initial reading",
}
```
See above style comment about setting the dictionary to a variable
- Use detailed error messages from `DatabaseError` for better debugging
…rganization terms - Treat blank contact organization and well status values as missing instead of persisting empty strings - Prevent foreign key failures caused by empty organization and status lexicon references during import - Add newly encountered organization terms to the lexicon so valid contact records can persist successfully
- Detect previously imported well inventory rows before inserting related records - Skip recreating field activity water-level samples and observations when the same row is reprocessed - Return serializable existing-row results so CLI reruns report cleanly instead of crashing
kas-BDMS-626-resolve-database-errors
Each contact should have a role and contact_type
Why
This PR addresses the following problem / context:
`public_availability_acknowledgement`, `monitoring_status`, well/water notes, and water level observations were not being persisted correctly.

How
Implementation summary - the following was changed / added / removed:
public_availability_acknowledgementnow maps toLocation.release_status(True → public, False → private, unset → draft).monitoring_statusis now written to theStatusHistorytable.well_notesandwater_notesare now stored as polymorphic notes on theThing.depth_to_water_ftnow auto-creates aSampleandObservationlinked to the field activity.WellInventoryRowschema:site_name,elevation_ft,elevation_method,measuring_point_height_ft, anddepth_to_water_ftoptional.elevation_method,depth_source,well_pump_type,monitoring_status,sample_method, anddata_quality.flexible_lexicon_validatorfor case-insensitive, whitespace-tolerant matching.contact_nameorcontact_organization, and madewater_level_date_timeonly required whendepth_to_water_ftis present.depth_to_water_ftNonevalidation_errorslist.UnboundLocalErrorcaused by auto-generated well IDs"Database error"fields.commit=Falsesupport to allowservices/thing_helper.pyandservices/contact_helper.pyo participate in the outer best-effort transaction without prematurely committing.tests/features/data/to use valid lexicon terms and properly quoted comma-containing values.Givensteps to isolate tests and prevent primary key conflicts.well-inventory-csv.featureto assert partial success (e.g., "1 well is imported") in negative scenarios. All 44 scenarios now pass.Notes
Any special considerations, workarounds, or follow-up work to note?